Querying Complex Structured Databases
Abstract
Correctly generating a structured query (e.g., an XQuery or a SQL query) requires the user to have a full understanding of the database schema, which can be a daunting task. Alternative query models have been proposed to let users query the database without schema knowledge. These models, including simple keyword search and labeled keyword search, aim to extract meaningful data fragments that match structure-free query conditions (e.g., keywords) under various matching semantics. Typically, the matching semantics are content-based: they are defined on inter-relationships between data nodes and incur significant query evaluation cost. Our first contribution is a novel matching semantics based on analyzing the database schema. We show that query models employing a schema-based matching semantics can reduce query evaluation cost significantly while maintaining or even improving result quality. Adopting a schema-based matching semantics does not change the nature of these query models: they remain schema-ignorant, i.e., users express no schema knowledge in the query (except the labels in labeled keyword search). While these models work well for some queries on some databases, they often run into problems when applied to complex queries on databases with complex schemas. Our second contribution is a novel query model that incorporates partial schema knowledge through the use of a schema summary. This new summary-aware query model, called Meaningful Summary Query (MSQ), seamlessly integrates summary-based structural conditions and structure-free conditions, and enables ordinary users to query complex databases. We design algorithms for evaluating MSQ queries, and demonstrate that MSQ queries produce better results on complex databases than previous approaches and can be evaluated efficiently.
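The abstract contrasts content-based matching semantics, defined over relationships between data nodes, with the schema-based semantics proposed in the paper. To make the content-based category concrete, the following is a minimal Python sketch of SLCA (smallest lowest common ancestor), a widely used content-based semantics for XML keyword search. It is an illustration chosen here, not the semantics proposed in the paper; the function names, the Dewey-label representation, the sample document, and the case-insensitive substring matching rule are all assumptions. Note that every query must gather per-keyword lists of matching data nodes and reason about their ancestor relationships, which is the kind of evaluation cost the abstract attributes to content-based semantics.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# A Dewey label is the path of child positions from the root, e.g. (0, 1).
Dewey = Tuple[int, ...]


def dewey_index(root: ET.Element) -> Dict[Dewey, ET.Element]:
    """Assign a Dewey label to every element of the tree."""
    index: Dict[Dewey, ET.Element] = {}
    stack = [((), root)]
    while stack:
        label, elem = stack.pop()
        index[label] = elem
        for i, child in enumerate(elem):
            stack.append((label + (i,), child))
    return index


def keyword_matches(index: Dict[Dewey, ET.Element], keyword: str) -> List[Dewey]:
    """Labels of elements whose tag or direct text contains the keyword.

    Assumption: case-insensitive substring matching on tag names and text.
    """
    kw = keyword.lower()
    return [
        label
        for label, elem in index.items()
        if kw in elem.tag.lower() or kw in (elem.text or "").lower()
    ]


def is_ancestor_or_self(a: Dewey, b: Dewey) -> bool:
    """True if the node labeled a is b itself or one of b's ancestors."""
    return b[: len(a)] == a


def slca(root: ET.Element, keywords: List[str]) -> List[Dewey]:
    """Brute-force SLCA: subtrees containing every keyword, minus any
    node that has a proper descendant which also contains every keyword.

    Published SLCA algorithms are much faster than this loop, but they
    still operate on per-keyword lists of matching data nodes.
    """
    index = dewey_index(root)
    match_lists = [keyword_matches(index, k) for k in keywords]
    # Nodes whose subtree holds at least one match for each keyword.
    covering = {
        label
        for label in index
        if all(any(is_ancestor_or_self(label, m) for m in ms) for ms in match_lists)
    }
    # Keep only the smallest covering nodes.
    result = [
        c
        for c in covering
        if not any(d != c and is_ancestor_or_self(c, d) for d in covering)
    ]
    return sorted(result)


# Hypothetical sample data for illustration only.
SAMPLE = ET.fromstring(
    "<bib>"
    "<book><title>XML Databases</title><author>Smith</author></book>"
    "<book><title>SQL Basics</title><author>Jones</author></book>"
    "</bib>"
)

if __name__ == "__main__":
    print(slca(SAMPLE, ["xml", "smith"]))  # [(0,)]: the first <book>
    print(slca(SAMPLE, ["xml", "jones"]))  # [()]: only the root <bib>
```

For the query `xml smith`, both keywords fall inside the first `<book>`, so that element is the single answer. For `xml jones`, the matches sit in different books and only the root covers both, which shows how content-based semantics can return overly broad fragments on some queries.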
Similar resources
Querying and computing with BioCyc databases
We describe multiple methods for accessing and querying the complex and integrated cellular data in the BioCyc family of databases: access through multiple file formats, access through Application Program Interfaces (APIs) for LISP, Perl and Java, and SQL access through the BioWarehouse relational database.
PESTO: An Integrated Query/Browser for Object Databases
This paper describes the design and implementation of PESTO (Portable Explorer of Structured Objects), a user interface that supports browsing and querying of object databases. PESTO allows users to navigate the relationships that exist among objects. In addition, users can formulate complex object queries through an integrated query paradigm (“query-in-place”) that presents querying as a natura...
An Approach of SQL to JSON Transformation For Handling Database Operations
Nowadays NoSQL databases are becoming more popular. Companies like Google, Facebook, and Amazon have created their own NoSQL databases based on their requirements. Different NoSQL databases follow different querying approaches, whereas traditional databases such as MySQL and Oracle use SQL for querying. Most companies are shifting from traditional databases to NoSQL ...
Approximating Retrieval Unit for Un-structured Querying in Relational Databases
In recent years, there has been significant interest in enabling keyword search, a form of unstructured querying, over relational databases. Existing approaches return a set of tuples joined together as a retrieval unit. However, the joined tuples may include data from attributes that are irrelevant or less interesting for the query keywords when the keywords are matched in multiple relati...
Standardizing the Querying Process with SGML: The SQL DTD
One of the most exciting applications of SGML to emerge in recent years is its use in document databases. The structural information embedded in SGML documents makes it possible to query them and extract information automatically; however, this querying process has not been standardized. As a result, different SGML database implementations use their own query lang...
The Role of Declarative Querying in Bioinformatics
The recent publication of a draft of the entire human genome (McPherson et al., 2001; Venter et al., 2001) has fueled an already explosive area of bioinformatics research concerned with deriving meaningful knowledge from proteins and DNA sequences (Alberts et al., 2002). Even with the full human genome sequence now in hand, scientists still face the challenges of determining exa...